HMM and CRF Based Hybrid Model for Chinese Lexical Analysis

Authors

  • Degen Huang
  • Xiao Sun
  • Shidou Jiao
  • Lishuang Li
  • Zhuoye Ding
  • Ru Wan
Abstract

This paper presents the Chinese lexical analysis systems developed by the Natural Language Processing Laboratory at Dalian University of Technology, which were evaluated in the 4th International Chinese Language Processing Bakeoff. The systems adopt an HMM and CRF hybrid model that combines a character-based model with a word-based model in a directed graph. They were evaluated in both the closed and open tracks of Chinese word segmentation, POS tagging, and Chinese named entity recognition, and achieved good performance. In particular, our system ranked 1st in the open track of Chinese word segmentation on the SXU corpus.

Similar resources

Pragmatic Chinese Lexical Analysis Based on Word-character Hybrid Model

In the field of information and natural language processing, lexical analysis is an important basic step for Chinese, Japanese, and other Asian languages. This paper presents a Chinese lexical analysis method that integrates word-level and character-level information, based on a hybrid model combining a word-based CRF model and a latent semi-CRF model. The word-lattice, which represents all candidate outputs, is...

A Hybrid Approach to Chinese-English Machine Translation

A hybrid method for Chinese-English machine translation is presented, in which rule-based analysis is combined with statistical data. The rule-based lexical analyzer and syntactic analyzer leave some ambiguity, which is resolved using a statistical approach. A Hidden Markov Model (HMM) is used to return a score for each part of speech, and an improved probabilistic context-free grammar (PCFG) is used for ...

A Chinese Word Segmentation System Based on Structured Support Vector Machine Utilization of Unlabeled Text Corpus

We participated in the open and closed tracks on four corpora of the Chinese word segmentation tasks in the CIPS-SIGHAN-2010 Bake-offs. In our experiments, we used Chinese inner phonology information in all tracks. For the open tracks, we proposed a double-hidden-layer HMM (DHHMM) in which Chinese inner phonology information was used as one hidden layer and the BIO tags as another hidden l...

Chinese Lexical Analysis Using Hierarchical Hidden Markov Model

This paper presents a unified approach to Chinese lexical analysis using a hierarchical hidden Markov model (HHMM), which aims to incorporate Chinese word segmentation, part-of-speech tagging, disambiguation, and unknown word recognition into a single theoretical framework. A class-based HMM is applied in word segmentation, and in this level unknown words are treated in the same way as common words l...

HMM Revises Low Marginal Probability by CRF for Chinese Word Segmentation

This paper presents a Chinese word segmentation system for the CIPS-SIGHAN 2010 Chinese language processing task. First, a character-based tagging model with local and global features is designed, based on the Conditional Random Field (CRF) model. Second, a Hidden Markov Model (HMM) is used to revise the substrings to which the CRF assigns low marginal probability. Finally, a confidence measure is used...


Publication date: 2008